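Assessing the movements and postures of newborns allows experienced pediatricians to predict neurodevelopmental disorders, enabling early intervention for related diseases. However, most recent AI approaches to human pose estimation focus on adults, and infant pose estimation lacks a public benchmark. In this paper, we fill this gap by proposing an infant pose dataset and a Deep Aggregation Vision Transformer for human pose estimation, AggPose, which introduces a fast-to-train, fully transformer-based framework that does not rely on convolution operations to extract features in the early stages. It generalizes Transformer + MLP to high-resolution deep layer aggregation within feature maps, enabling information fusion across different vision levels. We pre-train AggPose on the COCO pose dataset and apply it to our newly released large-scale infant pose estimation dataset. The results show that AggPose effectively learns multi-scale features across different resolutions and significantly improves infant pose estimation performance. We show that AggPose outperforms the hybrid models HRFormer and TokenPose on the infant pose estimation dataset. Moreover, on COCO val pose estimation, AggPose delivers a 0.8 AP improvement. Our code is available at github.com/szar-lab/aggpose.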
translated by 谷歌翻译
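Artificial intelligence (AI) offers a promising alternative for streamlining COVID-19 diagnosis. However, concerns about security and trustworthiness impede the collection of large-scale, representative medical data, which poses a considerable challenge for training well-generalized models in clinical practice. To address this, we launch the Unified CT-COVID AI Diagnostic Initiative (UCADI), in which the AI model is trained in a distributed manner and executed independently at each host institution under a federated learning (FL) framework, without data sharing. Here we show that our FL model performs strongly (test sensitivity/specificity in China: 0.973/0.951; in the UK: 0.730/0.942), achieving performance comparable to a panel of professional radiologists. We further evaluate the model on hold-out data (collected from two additional hospitals left out of the FL process) and heterogeneous data (acquired with contrast materials), provide visual explanations for the decisions made by the model, and analyze the trade-off between model performance and communication cost during federated training. Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients, collected from 23 hospitals located in China and the UK. Collectively, our work advances the prospects of using federated learning for privacy-preserving digital health.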
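The abstract above does not say how the per-institution models are combined. As a minimal sketch, assuming the common federated averaging rule (each institution's parameters weighted by its local sample count), the aggregation step could look like the following. The function name `federated_average`, the parameter names, and the two example hospitals are hypothetical and are not taken from UCADI.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average per-client parameters, weighting each client by its sample count.

    Assumption: plain federated averaging; the abstract does not name the
    aggregation rule that UCADI actually uses.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical hospitals holding different amounts of local data.
hospital_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
hospital_b = {"layer.weight": [3.0, 6.0], "layer.bias": [4.0]}
global_weights = federated_average([hospital_a, hospital_b], [100, 300])
print(global_weights)
```

Only parameters and sample counts cross institutional boundaries in this sketch; the raw scans stay local, which is the property the abstract emphasizes.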
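Transformers have become the de facto standard in natural language processing (NLP), and they are gaining momentum in computer vision and other domains. Transformers enable artificial intelligence (AI) models to attend dynamically to certain parts of their input and thus reason more effectively. Inspired by the success of transformers, we adopt this technique to predict strategic flight departure demand over multiple horizons. This work supports a MITRE-developed mobile application, Pacer, which displays predicted departure demand to general aviation (GA) flight operators so that they have better situational awareness of the potential for departure delays during busy periods. Field demonstrations involving Pacer's previously designed rule-based prediction method showed that the prediction accuracy for departure demand still has room for improvement. This research aims to improve prediction accuracy in two key respects: better data sources and robust forecasting algorithms. We use two data sources, Aviation System Performance Metrics (ASPM) and System Wide Information Management (SWIM), as our input. We then train forecasting models with the temporal fusion transformer (TFT) for five different airports. Case studies show that TFTs outperform traditional forecasting methods by large margins, yielding better predictions across diverse airports with better interpretability.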
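Predicting airport performance with a reasonable look-ahead time is a challenging task that has been attempted in various prior studies. Traffic, demand, weather, and traffic management actions are all critical inputs to any prediction model. This paper proposes a novel approach based on the Temporal Fusion Transformer (TFT) to predict departure and arrival delays for multiple airports at once. The approach captures the complex temporal dynamics of inputs known at prediction time and then forecasts selected delay metrics up to four hours into the future. To handle weather inputs, a self-supervised learning (SSL) model is developed to encode high-dimensional weather data into a much lower-dimensional representation, making TFT training more efficient and effective. Initial results show that the TFT-based delay prediction model achieves satisfactory performance, measured by smaller prediction errors on a test dataset. In addition, an interpretability analysis of the model outputs identifies the important input factors for delay prediction. The proposed approach is expected to help air traffic managers and decision makers gain insight into traffic management actions for delay mitigation and, once operationalized, to provide enough lead time to plan for predicted performance degradation.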
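Inspired by the success of deep learning (DL) in natural language processing (NLP), we apply cutting-edge DL techniques to predict flight departure demand over a strategic time horizon (4 hours or longer). This work supports a MITRE-developed mobile application, Pacer, which displays predicted departure demand to general aviation (GA) flight operators so that they have better situational awareness of the potential for departure delays during busy periods. Field demonstrations involving Pacer's previously designed rule-based prediction method showed that the prediction accuracy for departure demand still has room for improvement. This research aims to improve prediction accuracy in two key respects: better data sources and robust forecasting algorithms. We use two data sources, Aviation System Performance Metrics (ASPM) and System Wide Information Management (SWIM), as our input. We then train forecasting models with two DL techniques, sequence to sequence (seq2seq) and seq2seq with attention. The case study shows that our seq2seq model performs best among the four forecasting algorithms tested. In addition, with better data sources, seq2seq with attention reduces mean squared error (MSE) by more than 60% compared with the classical autoregressive (AR) forecasting method.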
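To make the reported metric concrete, the sketch below computes mean squared error (MSE) for two forecasts and the relative reduction of one against the other. The departure counts are made-up illustrative numbers, not data or results from the paper, and the helper names are hypothetical.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of squared differences between observed and forecast values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline_mse: float, model_mse: float) -> float:
    """Fraction by which model_mse is lower than baseline_mse."""
    return (baseline_mse - model_mse) / baseline_mse


# Illustrative hourly departure counts (invented for this example).
actual = [10, 12, 15, 11]
baseline_forecast = [14, 8, 19, 7]   # stands in for an AR-style baseline
model_forecast = [12, 10, 17, 9]     # stands in for a learned model

baseline_mse = mean_squared_error(actual, baseline_forecast)
model_mse = mean_squared_error(actual, model_forecast)
reduction = relative_reduction(baseline_mse, model_mse)
print(baseline_mse, model_mse, reduction)
```

In this toy case the reduction is 0.75. A claim such as "more than 60%" corresponds to `reduction > 0.6` under this definition.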
Knowledge graphs (KGs) serve as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations, which have few known triples for training. In light of this, few-shot KG completion (FKGC), which draws on the strengths of graph representation learning and few-shot learning, has been proposed to tackle the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. We then systematically categorize and summarize existing work in terms of the type of KG and the methods used. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
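The abstract mentions a category-oriented triplet loss but does not define it. As a rough illustration only, the sketch below shows the generic margin-based triplet loss, which penalizes an anchor feature for being closer to another category's center than to its own. How the paper actually forms triplets and sets the margin is not stated here, so the function names, the margin value, and the example vectors are all assumptions.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Generic triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical pixel feature and category centers.
pixel_feature = [0.0, 0.0]
own_center = [0.0, 1.0]        # distance 1.0 from the pixel feature
far_center = [3.0, 4.0]        # distance 5.0: well separated
near_center = [0.0, 1.5]       # distance 1.5: inside the margin

loss_separated = triplet_loss(pixel_feature, own_center, far_center)
loss_violating = triplet_loss(pixel_feature, own_center, near_center)
print(loss_separated, loss_violating)
```

The loss is zero once the other category's center is at least `margin` farther away than the feature's own center, so only poorly separated categories contribute to the gradient.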
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably involves a trade-off between accuracy and efficiency. Recent advances in neural operators, a class of mesh-independent, neural-network-based PDE solvers, suggest that this challenge may be overcome. In this emerging direction, the Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab for diverse applications of partial differential equations.
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
The Transformer has achieved impressive success in various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such a dataset is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement provided by ImageNet-pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach designed specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To make the most of the Transformer given limited medical data, we propose an auxiliary difficulty ranking task, in which the Transformer must identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer learns to distill transformation-invariant features from the perturbed tokens, simultaneously achieving difficulty measurement and maintaining the consistency of the self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification compared with ImageNet-pretrained weights and state-of-the-art self-supervised learning approaches.
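The abstract describes online and target branches but does not say how the target branch is updated. In bootstrap-style self-supervised methods, the target parameters are commonly kept as an exponential moving average (EMA) of the online parameters. The sketch below illustrates that update only, under the assumption that a similar rule applies here; the function name and the decay value are hypothetical.

```python
from typing import List


def ema_update(target: List[float], online: List[float], decay: float = 0.99) -> List[float]:
    """Move each target parameter a small step toward its online counterpart.

    Assumption: an EMA target update in the style of bootstrap methods; the
    abstract does not confirm that BOLT uses exactly this rule.
    """
    if len(target) != len(online):
        raise ValueError("parameter lists must have equal length")
    return [decay * t + (1.0 - decay) * o for t, o in zip(target, online)]


# Toy parameters; decay=0.5 is chosen only to keep the arithmetic easy to follow.
target_params = [0.0, 1.0]
online_params = [1.0, 1.0]
updated_target = ema_update(target_params, online_params, decay=0.5)
print(updated_target)
```

With a decay close to 1, the target branch changes slowly, which gives the online branch a stable prediction target between gradient steps.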